colmem: improve memory-limiting behavior of the accounting helpers #85440
Conversation
Force-pushed from 0e6e2f3 to 4dae132
Force-pushed from bead01c to eb420e3
Force-pushed from 2d064dc to 7478d18
Force-pushed from fde831e to 5e96448
Nice refactoring/stability improvements!
Reviewed 7 of 7 files at r1, 8 of 8 files at r2, 27 of 27 files at r3, 4 of 4 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/colfetcher/cfetcher.go
line 852 at r2 (raw file):
if cf.machine.limitHint > 0 && cf.machine.rowIdx >= cf.machine.limitHint {
	// If we made it to our limit hint, so output our batch early to
[nit] "If we" -> "We"
pkg/sql/colfetcher/vectorized_batch_size_test.go
line 153 at r3 (raw file):
for _, tc := range []struct {
	// numRows and rowSize must be of the same length, with each index
	// specifying the number of rows of the corresponding size to be
[nit] add that the row size is in bytes.
pkg/sql/colmem/allocator.go
line 567 at r3 (raw file):
// Init initializes the helper.
func (h *AccountingHelper) Init(allocator *Allocator, memoryLimit int64) {
[nit] Maybe mention here that the allocator can be shared between different components.
pkg/sql/colmem/allocator.go
line 573 at r3 (raw file):
// disk spilling" scenario, but the helper should ignore that, so we
// override it to the default value of the distsql_workmem variable.
memoryLimit = 64 << 20 /* 64 MiB */
Would it be better to use `GetWorkMemLimit` here? I'm wondering if there could be a case where a user sets the working memory limit to 1 for whatever reason, then gets unexpected results when we revert to the default.
pkg/sql/colmem/allocator.go
line 584 at r3 (raw file):
// yet to be set, use 0 if unknown.
func (h *AccountingHelper) ResetMaybeReallocate(
	typs []*types.T, oldBatch coldata.Batch, remainingTuples int,
[nit] `remainingTuples` confused me a bit when I saw this method being called elsewhere; maybe something like `tuplesToBeSet` or `remainingTuplesToSet` would be clearer?
pkg/sql/colmem/allocator.go
line 591 at r3 (raw file):
// with other components, thus, we cannot ask it directly for the batch
// mem size, yet the allocator can provide a useful upper bound.)
if batchMemSizeUpperBound := h.allocator.Used(); h.discardBatch(batchMemSizeUpperBound) {
Shouldn't we also be setting `h.maxCapacity` to the current batch capacity once we exceed the memory limit? (Rather than only setting it to half the current capacity once we reach double the limit.)
pkg/sql/colmem/allocator.go
line 719 at r3 (raw file):
// yet to be set, use 0 if unknown.
func (h *SetAccountingHelper) ResetMaybeReallocate(
	typs []*types.T, oldBatch coldata.Batch, remainingTuples int,
[nit] same as for `AccountingHelper`.
pkg/sql/colmem/allocator.go
line 821 at r3 (raw file):
// we update the memorized capacity. If it's the latter, then on the
// following call to ResetMaybeReallocate, the batch will be discarded.
h.helper.maxCapacity = rowIdx + 1
Should we be doing this in the `AccountingHelper` instead?
pkg/sql/colmem/allocator.go
line 713 at r4 (raw file):
// ResetMaybeReallocate is a light wrapper on top of
// Allocator.resetMaybeReallocate (and thus has the same contract) with an
[nit] This no longer has exactly the same contract.
pkg/sql/colmem/allocator_test.go
line 412 at r3 (raw file):
// Now the limit has been exceeded by too much again - a new
// batch of smaller capacity must be allocated.
{1, true, 100},
Could you add one last iteration like `{1, false, 1300}` to ensure we never attempt to shrink the capacity to zero?
Force-pushed from f92bf34 to 5af8fcf
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)
pkg/sql/colfetcher/cfetcher.go
line 852 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] "If we" -> "We"
Done.
pkg/sql/colfetcher/vectorized_batch_size_test.go
line 153 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] add that the row size is in bytes.
Done.
pkg/sql/colmem/allocator.go
line 567 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Maybe mention here that the allocator can be shared between different components.
Done.
pkg/sql/colmem/allocator.go
line 573 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Would it be better to use `GetWorkMemLimit` here? I'm wondering if there could be a case where a user sets the working memory limit to 1 for whatever reason, then gets unexpected results when we revert to the default.
We have a similar check in several places. This is a good point though - we should just prohibit the users from setting exactly 1B and reserve that value for internal test use. I'll open up a separate PR for that.
Another note is that I chose to hard-code the number 64MiB instead of using `execinfra.DefaultMemoryLimit` in order to not pollute the dependency graph.
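For illustration, a minimal sketch of the sentinel handling being discussed - the function name and structure here are assumptions for this sketch, not the actual colmem code:

```go
// effectiveMemoryLimit sketches the override discussed above: a 1-byte
// limit is the test-only "force disk spilling" convention, so the helper
// ignores it and falls back to the default distsql_workmem value, which
// is hard-coded to avoid a dependency on execinfra.
func effectiveMemoryLimit(memoryLimit int64) int64 {
	if memoryLimit == 1 {
		return 64 << 20 // 64 MiB
	}
	return memoryLimit
}
```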
pkg/sql/colmem/allocator.go
line 584 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] `remainingTuples` confused me a bit when I saw this method being called elsewhere; maybe something like `tuplesToBeSet` or `remainingTuplesToSet` would be clearer?
I like `tuplesToBeSet`, done.
pkg/sql/colmem/allocator.go
line 591 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Shouldn't we also be setting `h.maxCapacity` to the current batch capacity once we exceed the memory limit? (Rather than only setting it to half the current capacity once we reach double the limit.)
That's a good point. This is not strictly necessary for correct behavior, because `Allocator.ResetMaybeReallocate` would never allocate a new batch once it realizes that the old batch has reached the memory limit, but explicitly memorizing the max capacity in such a scenario makes things more clear (and avoids computing the batch memory size). Done.
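To make the two thresholds concrete, here is a hedged, self-contained sketch of the discard-and-halve rule plus the memorization added in this thread; all names are illustrative, not the actual `AccountingHelper` API:

```go
package main

import "fmt"

// batchCapHeuristic sketches the rule discussed above: at 2x the limit the
// batch is discarded and the ceiling is halved; at 1x the current capacity
// is memorized as the ceiling.
func batchCapHeuristic(maxCapacity *int, curCapacity int, used, limit int64) (discard bool) {
	switch {
	case used >= 2*limit:
		// Exceeded the limit by at least 2x: discard the batch; future
		// batches get at most half its capacity (floored at 1 so the
		// capacity never shrinks to zero).
		*maxCapacity = curCapacity / 2
		if *maxCapacity < 1 {
			*maxCapacity = 1
		}
		return true
	case used >= limit:
		// Exceeded the limit: memorize the current capacity as the ceiling.
		*maxCapacity = curCapacity
	}
	return false
}

func main() {
	maxCap := 0
	fmt.Println(batchCapHeuristic(&maxCap, 1024, 130<<20, 64<<20), maxCap) // true 512
	fmt.Println(batchCapHeuristic(&maxCap, 512, 65<<20, 64<<20), maxCap)   // false 512
}
```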
pkg/sql/colmem/allocator.go
line 719 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] same as for `AccountingHelper`.
Done.
pkg/sql/colmem/allocator.go
line 821 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should we be doing this in the `AccountingHelper` instead?
I think in `SetAccountingHelper` we can be more precise, since `AccountingHelper` operates at a batch granularity. Consider an example when the batch has a capacity of 16, but the `SetAccountingHelper` realizes that the memory limit is reached only after 3 tuples are set. The `AccountingHelper` could memorize 16 as `maxCapacity`, which would not work well with the expectations of the `SetAccountingHelper` - its user should now be setting at most 3 tuples.
The way I think about it is that the `AccountingHelper` works at a batch granularity, and `SetAccountingHelper` is a specialization of the `AccountingHelper` where we work at a tuple granularity, so the latter can make better decisions.
Does this make sense?
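A hedged sketch of the tuple-granularity idea from the 16-capacity/3-tuples example above; the type and method names are hypothetical, not the actual `colmem.SetAccountingHelper` API:

```go
package main

import "fmt"

// setHelperSketch illustrates tuple-granularity accounting: the helper is
// consulted after every tuple, so it can declare a batch full before its
// physical capacity is reached.
type setHelperSketch struct {
	used, memoryLimit int64 // used would come from the shared Allocator
	maxCapacity       int
}

func (h *setHelperSketch) AccountForSet(rowIdx int) (batchDone bool) {
	if h.used >= h.memoryLimit {
		// Memorize rowIdx+1 (e.g. 3) rather than the batch's full
		// capacity (e.g. 16), which a batch-level helper would have used.
		h.maxCapacity = rowIdx + 1
		return true
	}
	return false
}

func main() {
	h := setHelperSketch{used: 70 << 20, memoryLimit: 64 << 20}
	fmt.Println(h.AccountForSet(2), h.maxCapacity) // true 3
}
```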
pkg/sql/colmem/allocator.go
line 713 at r4 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] This no longer has exactly the same contract.
Done.
pkg/sql/colmem/allocator_test.go
line 412 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Could you add one last iteration like `{1, false, 1300}` to ensure we never attempt to shrink the capacity to zero?
Done.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
pkg/sql/colmem/allocator.go
line 821 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think in `SetAccountingHelper` we can be more precise, since `AccountingHelper` operates at a batch granularity. Consider an example when the batch has a capacity of 16, but the `SetAccountingHelper` realizes that the memory limit is reached only after 3 tuples are set. The `AccountingHelper` could memorize 16 as `maxCapacity`, which would not work well with the expectations of the `SetAccountingHelper` - its user should now be setting at most 3 tuples.
The way I think about it is that the `AccountingHelper` works at a batch granularity, and `SetAccountingHelper` is a specialization of the `AccountingHelper` where we work at a tuple granularity, so the latter can make better decisions. Does this make sense?
Yeah, SGTM.
I ran microbenchmarks of
This PR will benefit from an extra set of eyes since I want to backport this to 22.1, so I'll wait for @michae2 to take a look too.
@michae2 will you have time to take a look at this this week? That's ok if not. I'd like to merge it before stability, and I'm off on Th-F.
Nice work! I'm still reading.
This commit introduces a new heuristic as follows:
- the first time a batch exceeds the memory limit, its capacity is memorized, and from now on that capacity will determine the upper bound on the capacities of the batches allocated through the helper;
- if at any point in time a batch exceeds the memory limit by at least a factor of two, then that batch is discarded, and the capacity will never exceed half of the capacity of the discarded batch;
- if the memory limit is not reached, then the behavior of the dynamic growth of the capacity provided by Allocator.ResetMaybeReallocate is still applicable (i.e. the capacities will grow exponentially until coldata.BatchSize()).

Note that this heuristic does not have the ability to grow the maximum capacity once it's been set, although it might make sense to do so (say, if after shrinking the capacity, the next five times we see that the batch is using less than half of the memory limit). This is a conscious omission since I want this change to be backported, and never growing seems like a safer choice. Thus, this improvement is left as a TODO.
Sounds a lot like TCP congestion control. Maybe once we've hit the memory limit, we could use additive increase / multiplicative decrease instead of staying at that fixed capacity? But that's just an idea, this is already an improvement over what we've got. A TODO is fine. I'll keep reading.
Reviewed 6 of 7 files at r1, 29 of 29 files at r5, 2 of 8 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
Sounds a lot like TCP congestion control. Maybe once we've hit the memory limit, we could use additive increase / multiplicative decrease instead of staying at that fixed capacity?
Indeed, I also noticed a similar pattern. However, I realized that we cannot really apply the AIMD pattern here - the cost of that "additive" increase is non-trivial, since we would have to reallocate the whole underlying memory rather than "adding" a little more. Consider an example where we already have capacity for 32 rows: if our additive step is 2 rows, we'd release the whole 32-row batch to allocate a 34-row one. That reallocation has non-trivial implications, especially if we never reach an equilibrium, because then we'd be continuously reallocating memory - we would rather stick to a smaller batch and never reallocate again (unless that batch is extremely small).
(I guess we could use additive increase when the underlying capacities of the vectors allow it - i.e. say we asked for `make([]int64, 10)` but got a slice of capacity 16 - then we could take the additive step a few times at no cost. However, that would not be trivial for varlen types like Bytes.)
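A small runnable illustration of the reallocation cost mentioned above - nothing here is CockroachDB code, just plain Go slices standing in for a batch's backing memory:

```go
package main

import "fmt"

func main() {
	batch := make([]int64, 32) // a 32-row "batch"; len == cap == 32
	before := &batch[0]

	// Additive increase by 2 rows: len 32 -> 34 exceeds the capacity, so
	// append allocates a new backing array and copies all 32 rows over.
	batch = append(batch, 0, 0)
	fmt.Println(cap(batch) > 32, before == &batch[0]) // true false

	// With spare capacity already available, the same growth is free: no
	// reallocation, no copy - the "at no cost" case from the comment above.
	spare := make([]int64, 32, 64)
	before = &spare[0]
	spare = append(spare, 0, 0)
	fmt.Println(before == &spare[0]) // true
}
```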
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
I feel like the 'right' way to solve this in the long term might be to make the various
It's an interesting idea that'd be worth exploring. We'd need to be conscious about the performance implications though - I'm afraid that introducing those checks would have non-negligible overhead. However, at the same time those conditionals are likely to get very good correct branch prediction most of the time (once the batch has grown to maximum capacity), so maybe the impact would be relatively small. Still, we would probably lose the ability to inline most of the
I think only bytes columns would require any changes in
Yeah, fair point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this is all correct. Intuitively I feel like there is room for simplification in `Allocator.resetMaybeReallocate` and `AccountingHelper.ResetMaybeReallocate`, but it will require more thought to discover.
Can we let this bake for a little while before backporting?
Reviewed 6 of 8 files at r6, 27 of 27 files at r7, 4 of 4 files at r8, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)
pkg/sql/colexec/columnarizer.go
line 222 at r7 (raw file):
if reallocated {
	oldRows := c.buffered
	newRows := make(rowenc.EncDatumRows, c.batch.Capacity())
Should this sometimes be `c.helper.maxCapacity` instead of `c.batch.Capacity()`?
Thanks for the reviews! I agree with the premise that we should be able to find a solution that is both simpler and more precise.
Can we let this bake for a little while before backporting?
For sure - especially so that we'll test this behavior as part of QA next week.
bors r+
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
pkg/sql/colexec/columnarizer.go
line 222 at r7 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Should this sometimes be `c.helper.maxCapacity` instead of `c.batch.Capacity()`?
No, I don't think so. In the columnarizer we use the `AccountingHelper`, which operates at a batch granularity, so we won't ever have a case when `c.helper.maxCapacity < c.batch.Capacity()` - `ResetMaybeReallocate` will always give us a batch whose full capacity we can use. Only when using the `SetAccountingHelper` can we get the behavior where the helper tells us that the batch is done before its full capacity is used.
Build failed (retrying...)
bors r-
I think this needs a rebase.
Canceled.
bors r+
However, I realized that we cannot really apply the AIMD pattern here - the cost of that "additive" increase is non-trivial, since we would have to reallocate the whole underlying memory rather than "adding" a little more. Consider an example where we already have capacity for 32 rows: if our additive step is 2 rows, we'd release the whole 32-row batch to allocate a 34-row one. That reallocation has non-trivial implications, especially if we never reach an equilibrium, because then we'd be continuously reallocating memory - we would rather stick to a smaller batch and never reallocate again (unless that batch is extremely small).
Oh, good point, heh! 🙂
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @michae2, and @yuzefovich)
pkg/sql/colexec/columnarizer.go
line 222 at r7 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
No, I don't think so. In the columnarizer we use the `AccountingHelper`, which operates at a batch granularity, so we won't ever have a case when `c.helper.maxCapacity < c.batch.Capacity()` - `ResetMaybeReallocate` will always give us a batch whose full capacity we can use. Only when using the `SetAccountingHelper` can we get the behavior where the helper tells us that the batch is done before its full capacity is used.
Ah, thank you. This clarifies things for me.
Build failed (retrying...)
Build failed (retrying...)
Build succeeded
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from aec1290 to blathers/backport-release-22.1-85440: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
You may need to manually resolve merge conflicts with the backport tool.
Backport to branch 22.1.x failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
colmem: introduce a helper method when no memory limit should be applied
This commit is a pure mechanical change.
Release note: None
colmem: move some logic of capacity-limiting into the accounting helper
This commit moves the logic that was duplicated across each user of the
SetAccountingHelper into the helper itself. Clearly, this allows us to
de-duplicate some code, but it'll make it easier to refactor the code
which is done in the following commit.
Additionally, this commit makes a tiny change to make the resetting
behavior in the hash aggregator more precise.
Release note: None
colmem: improve memory-limiting behavior of the accounting helpers
This commit fixes an oversight in how we are allocating batches of the
"dynamic" capacity. We have two related ways for reallocating batches,
and both of them work by growing the capacity of the batch until the
memory limit is exceeded, and then the batch would be reused until the
end of the query execution. This is a reasonable heuristic under the
assumption that all tuples in the data stream are roughly equal in size,
but this might not be the case.
In particular, consider an example when 10k small rows of 1KiB are
followed by 10k large rows of 1MiB. According to our heuristic, we
happily grow the batch until 1024 in capacity, and then we do not shrink
the capacity of that batch, so once the large rows start appearing, we
put 1GiB worth of data into a single batch, significantly exceeding our
memory limit (usually 64MiB with the default `workmem` setting).

This commit introduces a new heuristic as follows:
- the first time a batch exceeds the memory limit, its capacity is memorized, and from now on that capacity will determine the upper bound on the capacities of the batches allocated through the helper;
- if at any point in time a batch exceeds the memory limit by at least a factor of two, then that batch is discarded, and the capacity will never exceed half of the capacity of the discarded batch;
- if the memory limit is not reached, then the behavior of the dynamic growth of the capacity provided by `Allocator.ResetMaybeReallocate` is still applicable (i.e. the capacities will grow exponentially until coldata.BatchSize()).
Note that this heuristic does not have the ability to grow the maximum capacity once it's been set, although it might make sense to do so (say, if after shrinking the capacity, the next five times we see that the batch is using less than half of the memory limit). This is a conscious omission since I want this change to be backported, and never growing seems like a safer choice. Thus, this improvement is left as a TODO.
Also, we still might create batches that are too large in memory footprint in those places that don't use the SetAccountingHelper (e.g. in the columnarizer), since we perform the memory limit check at the batch granularity. However, this commit improves things there so that we don't reuse such a batch on the next iteration and instead use half of its capacity.
Fixes: #76464.
Release note (bug fix): CockroachDB now more precisely respects the `distsql_workmem` setting, which improves the stability of each node and makes OOMs less likely.
colmem: unexport Allocator.ResetMaybeReallocate
This commit is a mechanical change to unexport `Allocator.ResetMaybeReallocate` so that users would be forced to use the method with the same name from the helpers. This required splitting off the tests into two files.
Release note: None